Welcome to Data Detective!
You've been hired as a Data Detective at the Higher Education Research Institute to analyze training and development data from polytechnic and higher education institutions.
Your mission: Use data analysis to uncover insights that will improve faculty training programs and institutional performance.
This interactive game will teach you how to:
- Clean and prepare data for analysis
- Create and interpret correlation matrices
- Perform regression analysis and understand R-squared
- Identify and address autocorrelation in time series data
- Visualize data and present findings
No prior experience with Excel or Gretl is required!
Module 1: Data Detective Academy
Your first assignment is to clean a messy dataset of faculty training records.
Data Cleaning Basics
Before analyzing data, we need to ensure it's clean and properly formatted. Common data issues include:
- Missing values
- Duplicate entries
- Inconsistent formatting
- Outliers
Your Task:
Identify and fix the following issues in the dataset:
- Find and remove duplicate entries
- Identify missing values and decide how to handle them
- Check for outliers in the Training_Hours column
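The three cleaning steps can be sketched in plain Python. This is a minimal illustration, not the game's own procedure. The records, the `Faculty_ID` field, median imputation for missing values, and the 1.5 × IQR outlier rule (Tukey's fences) are all assumptions chosen for the example.

```python
from statistics import median, quantiles

# Hypothetical faculty training records. Faculty_ID and the values are
# invented for illustration; only Training_Hours comes from the task.
records = [
    {"Faculty_ID": "F01", "Training_Hours": 20.0},
    {"Faculty_ID": "F02", "Training_Hours": 24.0},
    {"Faculty_ID": "F02", "Training_Hours": 24.0},   # exact duplicate
    {"Faculty_ID": "F03", "Training_Hours": None},   # missing value
    {"Faculty_ID": "F04", "Training_Hours": 22.0},
    {"Faculty_ID": "F05", "Training_Hours": 26.0},
    {"Faculty_ID": "F06", "Training_Hours": 300.0},  # suspicious value
]


def remove_duplicates(rows):
    """Keep the first occurrence of each identical record."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_median(rows, column):
    """Replace missing values in `column` with the median of the observed ones."""
    fill = median(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


clean = impute_median(remove_duplicates(records), "Training_Hours")
hours = [r["Training_Hours"] for r in clean]
print(len(clean), iqr_outliers(hours))
```

Flagging a value is not the same as removing it: a 300-hour entry might be a typo for 30 or a genuine extended program, so check the source before dropping it. Deleting incomplete rows is the main alternative to imputation when only a few records are affected.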
Knowledge Check (2.5%)
Module 2: The Case of the Missing Correlations
The institute needs to understand relationships between faculty training hours and student satisfaction scores.
Understanding Correlation
Correlation measures the strength and direction of the linear relationship between two variables:
- The correlation coefficient (r) ranges from -1 to +1
- +1 indicates a perfect positive correlation
- -1 indicates a perfect negative correlation
- 0 indicates no linear correlation
Example scatter plots: strong positive correlation (r ≈ +0.9), no correlation (r ≈ 0), strong negative correlation (r ≈ -0.9).
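The coefficient described above is usually Pearson's r: the covariance of the two variables divided by the product of their standard deviations. A minimal standard-library sketch follows; the function name and the sample values are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    # Numerator: sum of products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / sqrt(ss_x * ss_y)


# Made-up numbers for illustration, not the institute's data.
hours = [10, 20, 30, 40, 50]
satisfaction = [3.1, 3.4, 3.9, 4.2, 4.6]
print(round(pearson_r(hours, satisfaction), 3))

# A perfect but U-shaped (non-linear) relationship still gives r = 0.
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))
```

Excel's CORREL function returns the same coefficient. The second example is a reminder that r only captures linear relationships, so an r near 0 does not rule out a strong curved pattern.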
Your Task:
Analyze the correlation matrix to identify significant relationships between variables:
- Identify the strongest positive correlation
- Identify the strongest negative correlation
- Determine which training type has the strongest relationship with student satisfaction
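The three questions can be answered mechanically once the matrix is built: scan the off-diagonal pairs for the largest and smallest r, then compare training types by the absolute value of their r with satisfaction. A sketch under stated assumptions: the column names (`Workshop_Hours`, `Online_Hours`, `Teaching_Load`) and all values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson r; assumes equal lengths and non-constant variables."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Hypothetical columns and values, invented for illustration only.
data = {
    "Workshop_Hours": [10, 20, 25, 40, 50],
    "Online_Hours": [8, 12, 6, 10, 9],
    "Teaching_Load": [20, 19, 16, 15, 12],
    "Student_Satisfaction": [3.0, 3.5, 4.0, 4.5, 5.0],
}

# Full matrix: every variable against every other (the diagonal is 1.0).
matrix = {a: {b: pearson_r(data[a], data[b]) for b in data} for a in data}

# Off-diagonal pairs only, each pair counted once.
pairs = {(a, b): matrix[a][b] for a, b in combinations(data, 2)}
strongest_pos = max(pairs, key=pairs.get)
strongest_neg = min(pairs, key=pairs.get)

# Training type most strongly related to satisfaction, judged by |r|.
training_types = ["Workshop_Hours", "Online_Hours"]
best_type = max(training_types, key=lambda t: abs(matrix[t]["Student_Satisfaction"]))

for a in data:
    print(f"{a:>22}", " ".join(f"{matrix[a][b]:6.2f}" for b in data))
print("strongest +:", strongest_pos, round(pairs[strongest_pos], 3))
print("strongest -:", strongest_neg, round(pairs[strongest_neg], 3))
print("training type most related to satisfaction:", best_type)
```

A large |r| is not the same as a statistically significant one: significance also depends on the sample size, and with only five observations even sizeable correlations can arise by chance.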
Knowledge Check (2.5%)
Module 3: Regression Investigation
Predict future training needs based on historical data and faculty performance metrics.
Understanding Regression Analysis
Regression analysis estimates how a dependent variable changes, on average, when one or more independent variables change:
- Simple regression: One independent variable
- Multiple regression: Two or more independent variables
- R-squared: Measures how well the model explains variation in the dependent variable
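Both kinds of regression are usually fitted by ordinary least squares (OLS), which picks the coefficients that minimize the sum of squared residuals. The sketch below solves the OLS normal equations with Gauss-Jordan elimination so it runs without extra libraries. The function name `ols` and the toy data are assumptions, and it returns only coefficients, not the standard errors and p-values that a package such as Gretl reports.

```python
def ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk.

    X holds one row per observation with the k predictor values. The function
    solves the normal equations (A^T A) b = A^T y, where A is X with a leading
    column of 1s for the intercept, using Gauss-Jordan elimination.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [
        [sum(a[i] * a[j] for a in A) for j in range(p)]
        + [sum(a[i] * yi for a, yi in zip(A, y))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting: move the largest remaining entry into place.
        pivot = max(range(col, p), key=lambda row: abs(M[row][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from every other row.
        for other in range(p):
            if other != col:
                factor = M[other][col] / M[col][col]
                M[other] = [u - factor * v for u, v in zip(M[other], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


# Multiple regression on toy data generated exactly from y = 1 + 2*x1 + 3*x2.
X = [[1, 0], [0, 1], [1, 1], [2, 1], [3, 2]]
y = [3, 4, 6, 8, 13]
print([round(b, 6) for b in ols(X, y)])

# Simple regression is the one-predictor case.
hours = [10, 20, 30, 40, 50]
satisfaction = [3.1, 3.4, 3.9, 4.2, 4.6]
print([round(b, 6) for b in ols([[h] for h in hours], satisfaction)])
```

Forming the normal equations is fine for small teaching examples; for real work, QR- or SVD-based solvers are numerically safer.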
Understanding R-squared
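R-squared (R²) is the proportion of the variance in the dependent variable that the independent variables explain. For a least-squares model with an intercept it ranges from 0 to 1; for example, an R² of 0.65 means the model accounts for 65% of the variation, leaving 35% unexplained.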
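R² compares the squared residuals with the total variation around the mean: R² = 1 - SS_res / SS_tot. Adjusted R², 1 - (1 - R²)(n - 1)/(n - k - 1), penalizes extra predictors, which matters when comparing multiple regression models. A minimal sketch with invented data; the function names are hypothetical.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot for observed y and fitted y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))  # unexplained
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)                 # total
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Penalize R^2 for k predictors estimated from n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Toy data; y_hat = 2.7 + 0.038 * x is the least-squares line for these values.
hours = [10, 20, 30, 40, 50]
satisfaction = [3.1, 3.4, 3.9, 4.2, 4.6]
fitted = [2.7 + 0.038 * h for h in hours]

r2 = r_squared(satisfaction, fitted)
print(round(r2, 4), round(adjusted_r_squared(r2, n=5, k=1), 4))
```

With a single predictor, R² equals the square of Pearson's r between x and y, which is a handy consistency check on regression output.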
Your Task:
Analyze the regression output to determine which factors most influence student satisfaction:
- Identify which training type has the strongest effect on student satisfaction
- Interpret the R-squared value
- Determine if the overall model is statistically significant
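The overall significance of a regression is usually judged with an F-test of the null hypothesis that all slope coefficients are zero. The statistic can be computed from R² alone as F = (R²/k) / ((1 - R²)/(n - k - 1)), with k and n - k - 1 degrees of freedom. A sketch using hypothetical numbers (R² = 0.5, n = 23, k = 2):

```python
def overall_f_test(r2, n, k):
    """F-statistic for H0: every slope coefficient equals zero.

    r2: model R-squared; n: number of observations; k: number of predictors
    (not counting the intercept). Returns (F, df1, df2).
    """
    df1, df2 = k, n - k - 1
    f_stat = (r2 / df1) / ((1 - r2) / df2)
    return f_stat, df1, df2


# Hypothetical model: R^2 = 0.5 from 23 observations and 2 predictors.
f_stat, df1, df2 = overall_f_test(0.5, n=23, k=2)
print(f"F({df1}, {df2}) = {f_stat:.2f}")
# The 5% critical value of F(2, 20) is about 3.49, so this F would be
# significant at the 5% level.
```

Regression output in tools such as Gretl normally reports this F-statistic together with its p-value. In Python, `scipy.stats.f.sf(f_stat, df1, df2)` gives the p-value if SciPy is available; the model is usually considered significant when p < 0.05.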
Knowledge Check (2.5%)
Module 4: The Time Series Mystery
Analyze 5 years of faculty development data to identify trends and seasonal patterns.
Understanding Time Series Analysis
Time series data has special characteristics that require specific analytical approaches:
- Trends: Long-term movements in the data
- Seasonality: Regular patterns that repeat at fixed intervals
- Autocorrelation: When observations are related to previous observations
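Autocorrelation at lag k compares a series with itself shifted k periods. A peak at the seasonal lag (4 for quarterly data) is a common sign of seasonality. In this sketch the quarterly frequency and the values are assumptions; the module only says the data cover 5 years.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag (lag >= 1)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    # Pair each observation with the one `lag` periods earlier.
    num = sum(dev[t] * dev[t - lag] for t in range(lag, n))
    return num / denom


# Hypothetical quarterly training hours: 5 years x 4 quarters, with the
# same seasonal pattern every year and no trend.
quarterly_hours = [10, 20, 15, 5] * 5
for lag in (1, 2, 4):
    print(lag, round(autocorrelation(quarterly_hours, lag), 3))
```

A strong trend would push the autocorrelations at short lags toward +1, which is one reason trends are often removed (for example by differencing) before reading seasonal patterns.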
Understanding Autocorrelation
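In a regression model, autocorrelation (serial correlation) means the error terms are correlated over time, which makes the usual standard errors and significance tests unreliable:
- Durbin-Watson (DW) statistic: tests for first-order autocorrelation in the residuals; it ranges from 0 to 4, and values near 2 indicate no autocorrelation
- Positive autocorrelation: DW well below 2 (common in time series data)
- Negative autocorrelation: DW well above 2 (less common)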
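The Durbin-Watson statistic is computed from the regression residuals e_t as DW = Σ(e_t - e_(t-1))² / Σ e_t². It is approximately 2(1 - ρ), where ρ is the lag-1 autocorrelation of the residuals, which is why 2 marks "no autocorrelation". A sketch with invented residual patterns:

```python
def durbin_watson(residuals):
    """DW = sum of squared successive differences / sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Invented residuals: long runs of the same sign (positive autocorrelation)
# versus signs that flip every period (negative autocorrelation).
runs = [1.0, 1.2, 0.8, -0.9, -1.1, -1.0, 0.9, 1.1]
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
print(round(durbin_watson(runs), 2), round(durbin_watson(alternating), 2))
```

A formal decision compares DW with the lower and upper critical values (dL, dU) from Durbin-Watson tables for the given sample size and number of regressors. DW is also not valid when the model includes a lagged dependent variable; tests such as Durbin's h or Breusch-Godfrey are used in that case.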
Your Task:
Analyze the time series data and address autocorrelation issues:
- Identify if autocorrelation is present in the model
- Apply a correction method (add lagged variables or use first differences)
- Compare the original and corrected models
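Both corrections amount to building new columns before refitting the model. A sketch under stated assumptions: the helper names `first_difference` and `lagged` are hypothetical, and the yearly data are invented.

```python
def first_difference(series):
    """Delta y_t = y_t - y_(t-1); the result is one element shorter."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def lagged(series, k=1):
    """Shift the series k periods; the first k values are unknown (None)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return [None] * k + series[:-k]


# Hypothetical yearly data: training hours (x) and satisfaction (y).
x = [20, 24, 30, 33, 40]
y = [3.2, 3.5, 3.9, 4.0, 4.4]

# Option 1: regress the year-on-year changes, dy on dx.
dx, dy = first_difference(x), first_difference(y)

# Option 2: keep the levels but add y_(t-1) as an extra regressor; the
# first observation is dropped because its lag is missing.
y_lag = lagged(y)
rows = [(x[t], y_lag[t], y[t]) for t in range(1, len(y))]

print(dx)
print([round(v, 2) for v in dy])
print(rows)
```

To compare the models, refit on the transformed data and recompute the residual diagnostics. Note that differencing changes the interpretation: the coefficients then describe how changes in x relate to changes in y.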
Knowledge Check (2.5%)
Module 5: The Final Report
Compile findings into a comprehensive report for the institute's board.
Data Visualization and Reporting
Effective data visualization and reporting are crucial for communicating your findings:
- Choose appropriate chart types for different data relationships
- Create clear, labeled visualizations
- Interpret results in context
- Provide actionable recommendations
Example charts: Training Hours vs. Performance; Training Type Distribution; Training Effectiveness Over Time; Faculty Performance by Experience.
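Choosing a chart type mostly depends on what kinds of variables are being shown. The helper below encodes common rules of thumb, not a fixed standard; the function name and the variable kinds assigned to each chart (for example, treating experience as grouped bands) are assumptions.

```python
def suggest_chart(x_kind, y_kind=None):
    """Suggest a chart type for "numeric", "categorical", or "time" variables.

    These are common rules of thumb, not fixed standards.
    """
    if y_kind is None:
        # A single variable: show its distribution.
        return {"categorical": "bar chart of counts", "numeric": "histogram"}[x_kind]
    if x_kind == "time":
        return "line chart"
    if x_kind == "numeric" and y_kind == "numeric":
        return "scatter plot (optionally with a fitted regression line)"
    if x_kind == "categorical" and y_kind == "numeric":
        return "box plot or bar chart of group means"
    raise ValueError(f"no rule for ({x_kind}, {y_kind})")


# The four example charts, with the variable kinds assumed for each.
charts = {
    "Training Hours vs. Performance": ("numeric", "numeric"),
    "Training Type Distribution": ("categorical", None),
    "Training Effectiveness Over Time": ("time", "numeric"),
    "Faculty Performance by Experience": ("categorical", "numeric"),
}
for title, kinds in charts.items():
    print(f"{title}: {suggest_chart(*kinds)}")
```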
Report Structure
1. Executive Summary: brief overview of key findings and recommendations
2. Data Analysis: detailed analysis of faculty training data
3. Correlation Findings: key relationships between variables
4. Regression Results: factors influencing student satisfaction and faculty performance
5. Time Series Trends: patterns and projections over time
6. Recommendations: evidence-based suggestions for training programs
Your Task:
Create a comprehensive report based on your findings from the previous modules:
- Select appropriate visualizations for key findings
- Interpret the results in the context of faculty development
- Provide actionable recommendations for improving training programs
Knowledge Check (2.5%)
Congratulations, Data Detective!
Skills Acquired:
- Data Cleaning and Preparation
- Correlation Analysis
- Regression Analysis and R-squared Interpretation
- Time Series Analysis and Autocorrelation
- Data Visualization and Reporting